Search and Rescue (SAR) missions in remote environments often employ autonomous multi-robot systems that learn, plan, and execute a combination of local single-robot control actions, group primitives, and global mission-oriented coordination and collaboration. Often, SAR coordination strategies are manually designed by human experts who can remotely control the multi-robot system and enable semi-autonomous operations. However, in remote environments where connectivity is limited and human intervention is often not possible, decentralized collaboration strategies are needed for fully-autonomous operations. Nevertheless, decentralized coordination may be ineffective in adversarial environments due to sensor noise, actuation faults, or manipulation of inter-agent communication data. In this paper, we propose an algorithmic approach based on adversarial multi-agent reinforcement learning (MARL) that allows robots to efficiently coordinate their strategies in the presence of adversarial inter-agent communications. In our setup, the objective of the multi-robot team is to discover targets strategically in an obstacle-strewn geographical area by minimizing the average time needed to find the targets. It is assumed that the robots have no prior knowledge of the target locations, and they can interact with only a subset of neighboring robots at any time. Based on the centralized training with decentralized execution (CTDE) paradigm in MARL, we utilize a hierarchical meta-learning framework to learn dynamic team-coordination modalities and discover emergent team behavior under complex cooperative-competitive scenarios. The effectiveness of our approach is demonstrated on a collection of prototype grid-world environments with different specifications of benign and adversarial agents, target locations, and agent rewards.
translated by 谷歌翻译
临时团队合作的进步有可能创建在现实世界应用程序中合作的代理商。但是,部署在现实世界中的代理人容易受到颠覆它们的意图的对手。在临时团队工作中,几乎没有研究对手的存在。我们解释了扩展临时团队工作以包括对手的存在的重要性,并澄清了为什么这个问题很困难。然后,我们提出了一些在临时团队合作中的新研究机会的指示,这会导致更强大的多代理网络物理基础设施系统。
translated by 谷歌翻译
我们调查攻击者的效果如何,当它只从受害者的行为中学习时,没有受害者的奖励。在这项工作中,当受害者的动机未知时,我们被攻击者想要行事的情景。我们认为一个启发式方法可以使用攻击者是最大化受害者政策的熵。政策通常不会被滥用,这意味着它可以通过被动地观察受害者来提取。我们以奖励无源勘探算法的形式提供这样的策略,可以在勘探阶段最大化攻击者的熵,然后在规划阶段最大化受害者的经验熵。在我们的实验中,受害者代理商通过政策熵最大化而颠覆,暗示攻击者可能无法访问受害者的奖励成功。因此,仅基于观察行为的无奖励攻击表明,即使受害者的奖励信息受到保护,攻击者的可行性也在不了解受害者的动机。
translated by 谷歌翻译
多智能经纪环境中的单代理强化学习算法不足以促进合作。如果智能代理商共同互动并共同努力解决复杂的问题,则需要计数器非合作行为的方法来促进多个代理的培训。这是合作AI的目标。然而,最近在对抗机器学习中的工作表明,模型(例如,图像分类器)可以很容易地欺骗制作不正确的决策。此外,在合作社的一些过去的研究依赖于陈述的新概念,如公共信仰,加快了解最佳合作行为的学习。因此,合作AI可能会引入以前的机器学习研究中未调查的新弱点。在本文中,我们的贡献包括:(1)争论由人类的社会情报启发的三种算法引入了新的漏洞,独一无二的合作益处,对手可以利用,并显示出对此简单,对抗的实验代理人的信念可能会产生负面影响。本证据表明了社会行为正式陈述的可能性易受对抗性袭击的影响。
translated by 谷歌翻译
复杂的系统展示了简单规则之后可以从结构或代理中出现的令人惊讶和美丽的现象。随着近期增强学习(RL)的成功,向前的自然道路将是使用多个深入RL代理商的能力来产生更高益处和复杂性的紧急行为。一般而言,由于多代理RL培训所固有的困难,这已被证明是一种不可靠的策略,而没有显着的计算。在本文中,我们提出了多王子RL中创造力的一些标准。我们希望这项提案将使艺术家施加多元经纪人RL A起点,并提供哲学讨论引导的进一步调查的催化剂。
translated by 谷歌翻译
By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific datasets to a high level of performance. While this capability has been demonstrated in other fields such as computer vision, natural language processing or speech recognition, it remains to be shown in robotics, where the generalization capabilities of the models are particularly critical due to the difficulty of collecting real-world robotic data. We argue that one of the keys to the success of such general robotic models lies with open-ended task-agnostic training, combined with high-capacity architectures that can absorb all of the diverse, robotic data. In this paper, we present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties. We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks. The project's website and videos can be found at robotics-transformer.github.io
translated by 谷歌翻译
This work addresses cross-view camera pose estimation, i.e., determining the 3-DoF camera pose of a given ground-level image w.r.t. an aerial image of the local area. We propose SliceMatch, which consists of ground and aerial feature extractors, feature aggregators, and a pose predictor. The feature extractors extract dense features from the ground and aerial images. Given a set of candidate camera poses, the feature aggregators construct a single ground descriptor and a set of rotational equivariant pose-dependent aerial descriptors. Notably, our novel aerial feature aggregator has a cross-view attention module for ground-view guided aerial feature selection, and utilizes the geometric projection of the ground camera's viewing frustum on the aerial image to pool features. The efficient construction of aerial descriptors is achieved by using precomputed masks and by re-assembling the aerial descriptors for rotated poses. SliceMatch is trained using contrastive learning and pose estimation is formulated as a similarity comparison between the ground descriptor and the aerial descriptors. SliceMatch outperforms the state-of-the-art by 19% and 62% in median localization error on the VIGOR and KITTI datasets, with 3x FPS of the fastest baseline.
translated by 谷歌翻译
This paper is a technical overview of DeepMind and Google's recent work on reinforcement learning for controlling commercial cooling systems. Building on expertise that began with cooling Google's data centers more efficiently, we recently conducted live experiments on two real-world facilities in partnership with Trane Technologies, a building management system provider. These live experiments had a variety of challenges in areas such as evaluation, learning from offline data, and constraint satisfaction. Our paper describes these challenges in the hope that awareness of them will benefit future applied RL work. We also describe the way we adapted our RL system to deal with these challenges, resulting in energy savings of approximately 9% and 13% respectively at the two live experiment sites.
translated by 谷歌翻译
Grammatical Error Correction (GEC) is the task of automatically detecting and correcting errors in text. The task not only includes the correction of grammatical errors, such as missing prepositions and mismatched subject-verb agreement, but also orthographic and semantic errors, such as misspellings and word choice errors respectively. The field has seen significant progress in the last decade, motivated in part by a series of five shared tasks, which drove the development of rule-based methods, statistical classifiers, statistical machine translation, and finally neural machine translation systems which represent the current dominant state of the art. In this survey paper, we condense the field into a single article and first outline some of the linguistic challenges of the task, introduce the most popular datasets that are available to researchers (for both English and other languages), and summarise the various methods and techniques that have been developed with a particular focus on artificial error generation. We next describe the many different approaches to evaluation as well as concerns surrounding metric reliability, especially in relation to subjective human judgements, before concluding with an overview of recent progress and suggestions for future work and remaining challenges. We hope that this survey will serve as comprehensive resource for researchers who are new to the field or who want to be kept apprised of recent developments.
translated by 谷歌翻译
The computational complexity of the self-attention mechanism in Transformer models significantly limits their ability to generalize over long temporal durations. Memory-augmentation, or the explicit storing of past information in external memory for subsequent predictions, has become a constructive avenue for mitigating this limitation. We argue that memory-augmented Transformers can benefit substantially from considering insights from the memory literature in humans. We detail an approach for integrating evidence from the human memory system through the specification of cross-domain linking hypotheses. We then provide an empirical demonstration to evaluate the use of surprisal as a linking hypothesis, and further identify the limitations of this approach to inform future research.
translated by 谷歌翻译